class: center, middle, inverse, title-slide

.title[
# Statistical Programming in R: Lecture 1
]
.subtitle[
## Course Preliminaries, and Introduction to R and RStudio
]
.author[
### Josemari Feliciano
]
.institute[
### DATA 612 - American University
]
.date[
### Fall 2025 (August 26, 2025)
]

---

## Today's Agenda

<style type="text/css">
.tiny .remark-code { /*Change made here*/
  font-size: 70% !important;
}
.extra-tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
</style>

1. Go over the syllabus
2. Introduction to R and RStudio
3. Installing R and RStudio
4. Basic R syntax and programming
5. Hands-on practice

---
class: center, middle

## Go over the syllabus

### Let us switch to Canvas where a copy of the course syllabus is located.

---

## Class Format

Starting with Lecture 2, the class will generally follow this format:

- I will lecture for the first 90 mins (approximately).
- We will take a 10 min break.
- I will assign practice problems so you can practice what was taught in lecture.
- We will go over the practice problems as a class. Completed problems will be posted as a guide to help you complete future homework.

---

## Introduction to R and RStudio

__What is R?__

- R is an open-source statistical language that is widely used across statistics and data science. R is more than a statistical package: it is a language and environment designed for statistical analysis and the production of high-quality graphics.
- It was originally developed by two statisticians at the University of Auckland as a dialect of the S statistical language.
- R is both open-source and open development. For more information, see www.r-project.org/contributors.html

---
class: center, middle

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/R.png" alt="Figure 1. A screenshot of what R looks like on macOS. Note: You will never actually work with R directly. You will work with R using RStudio."
width="100%" />
<p class="caption">Figure 1. A screenshot of what R looks like on macOS. Note: You will never actually work with R directly. You will work with R using RStudio.</p>
</div>

---

## Introduction to R and RStudio

__Why learn R?__

- R is a powerful, flexible, and free (open-source) language designed specifically for statistical computing.
- There is an extensive collection of packages created by R users to extend R and implement modern statistical techniques.
- Furthermore, R is an interpreted, high-level language, which means that we can write code and run it line by line in real time without needing to worry about low-level programming such as memory management.

---

## Introduction to R and RStudio

__What is RStudio?__

- RStudio is an integrated development environment<sup>1</sup>, or IDE, for R programming, which you can download from https://posit.co/download/rstudio-desktop/.
- RStudio helps R users use R more effectively by making common tasks easier. __One example is on the next slide.__
- RStudio is updated a couple of times a year, and it will automatically let you know when a new version is out, so there’s no need to check back. It’s a good idea to upgrade regularly to take advantage of the latest and greatest features.

.footnote[
[1] IDEs are tools designed to increase programmer productivity by combining common activities of writing software into a single application: editing source code, building executables, and debugging.
]

---
class: middle

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/RStudio.png" alt="Figure 2. A screenshot of what RStudio looks like on macOS." width="100%" />
<p class="caption">Figure 2. A screenshot of what RStudio looks like on macOS.</p>
</div>

---

## Why learn both R and RStudio?

Both tools are widely used by scientists, academics, data analysts, and data scientists.

According to Glassdoor (as of June 6, 2024):

- The median total yearly pay for data analysts in Washington, DC is $107,000.
- The median total yearly pay for data scientists in Washington, DC is $183,000.

In my old team at the United States Department of Agriculture, data analysts and data scientists are making `$`117,962 - `$`153,354 in 2025.

In my current team at another federal agency, data scientists are making `$`139,395 - `$`181,216 in 2025.

---
class: center, middle

## My personal experience with R

### Some examples of past work that leveraged both R and RStudio

---
class: center, middle

## Installing R and RStudio

### Let us switch to Canvas where a copy of installation instructions is located.

### We will spend up to 20 mins to ensure both R and RStudio are installed on your computer.

---
class: center, middle

## RStudio Basics

---

## RStudio Basics

__Warning:__ This class is an applied data science class. You will get a lot of practice. However, today's class is full of definitions and terms that you will become familiar with throughout the semester. You don't need to memorize most of the upcoming terms.

Although you will have some practice today, I see Lecture 2 next week as the real first class where you actually code properly.

<br>

__Today's penultimate slide will summarize key takeaways after we complete today's exercise.__

---

## RStudio Basics

There are four 'panes' or windows in RStudio that we generally use. Immediately after installation, you may see only three (more on this later). But once you start writing and saving R scripts, you will regularly interact with all four panes.

__These panes are:__

- Environment Pane.
- Console Pane.
- Files Pane.
- Source Pane.

---
class: middle

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1environmentpane.png" alt="Figure 3. This is called the Environment Pane from RStudio which allows users to track which variables or data have been saved into the R environment. More on this later."
width="100%" />
<p class="caption">Figure 3. This is called the Environment Pane from RStudio which allows users to track which variables or data have been saved into the R environment. More on this later.</p>
</div>

---
class: center, middle

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1consolepane.png" alt="Figure 4. This is called the Console Pane from RStudio (Linux Version) which allows users to type in and execute scripts." width="100%" />
<p class="caption">Figure 4. This is called the Console Pane from RStudio (Linux Version) which allows users to type in and execute scripts.</p>
</div>

---
class: center, middle

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/Lecture1Typing.png" alt="Figure 5. Here is an example of a simple script for addition. Type '4+2' then press Enter/return in the Console Pane." width="100%" />
<p class="caption">Figure 5. Here is an example of a simple script for addition. Type '4+2' then press Enter/return in the Console Pane.</p>
</div>

---

## Brief Detour: Basic Arithmetic Operators in R

Since we are discussing running simple R scripts:

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1operation.png" alt="Figure 6. Five basic arithmetic operators you can use in R." width="60%" />
<p class="caption">Figure 6. Five basic arithmetic operators you can use in R.</p>
</div>

__Note:__ `+` is addition; `-` is subtraction; `*` is multiplication; `/` is division; and `**` or `^` is exponentiation.

---

# Console Pane

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/Lecture1Typing.png" alt="Figure 5. Here is an example of a simple script for addition. Type '4+2' then press Enter/return in the Console Pane." width="40%" />
<p class="caption">Figure 5. Here is an example of a simple script for addition.
Type '4+2' then press Enter/return in the Console Pane.</p>
</div>

Throughout this course, we will rarely type in and run scripts from the console pane. Generally, you want to save the scripts you write in an R file and run them from there (more on this later). Today is one of the exceptions.

Generally, we will run scripts in the console pane to install packages. For now, you may think of a package as a collection of tools that increase productivity and perform specific tasks (e.g., certain packages can help you create maps).

Within the next few slides, we will install packages that you will need for the first four weeks of the course.

---

# The tidyverse

- An R package is a collection of functions, data, and documentation that extends the capabilities of base R.
- Using packages is key to the successful use of R. The majority of the packages that you will learn in this course are part of the so-called tidyverse.
- All packages in the tidyverse share a common philosophy of data and R programming and are designed to work together.

---

# Install the tidyverse packages

Type in then execute (by pressing Enter/return) this code within your console pane:

`install.packages("tidyverse")`

You only need to install this once. If you've used R previously, you might already have it.

Once you have tidyverse installed, you need to load the package each time you start a new R session.

<br>

More generally, run this script to install any package:

`install.packages("[fill in package name]")`

---

# Loading packages in R

To load a package, you need to run/execute this template script:

`library([fill in package name])`

Note: Quotation marks are not required when loading a package.

__Again, you need to load the package each time you start a new R session.__

After loading packages in R, you can use the programming 'tools' included within each package to increase your productivity and perform highly specialized tasks.
This may seem trivial for now, but you will get a lot of practice throughout the course.

This is how you load the tidyverse package into R:

``` r
library(tidyverse)
```

---

# Files Pane

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/FilesPane.png" alt="Figure 9. A screenshot of the Files pane. We will keep revisiting the Files pane throughout the semester." width="50%" />
<p class="caption">Figure 9. A screenshot of the Files pane. We will keep revisiting the Files pane throughout the semester.</p>
</div>

---

# Source Pane: Missing

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1threepanes.png" alt="Figure 10. Three panes you see upon opening RStudio. Initially, it excludes a fourth pane called the source pane." width="70%" />
<p class="caption">Figure 10. Three panes you see upon opening RStudio. Initially, it excludes a fourth pane called the source pane.</p>
</div>

---

# Source Pane: Creating and saving R Scripts

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1openingfourthpane.png" alt="Figure 11. The fourth pane (source pane) will appear when you create a new file called R Script (or load an existing R file)." width="60%" />
<p class="caption">Figure 11. The fourth pane (source pane) will appear when you create a new file called R Script (or load an existing R file).</p>
</div>

__For practice (live demo):__ Click File > New File > R Script. Within the file, type in one of the scripts you learned (e.g., one using the five basic arithmetic operators). Once you are done, click File > Save As. Name the file however you want and save it in a location that you can remember. Close RStudio. Then try double-clicking the file from the location where you saved it.

---

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/lecture1rscript.png" alt="Figure 12.
What you will see once you have loaded a saved file and run the script within it." width="90%" />
<p class="caption">Figure 12. What you will see once you have loaded a saved file and run the script within it.</p>
</div>

Note: To run a script from an R Script file, click anywhere on line 1 (or highlight the code you want to run), and press the 'Run' button in the upper-right corner of the source pane.

---

# Source pane: Creating and saving R Scripts

__A side note:__ Although I am teaching you how to create an R Script (i.e., File > New File > R Script), we will be creating and using Quarto notebooks (more on this next week) throughout this semester.

---

# Commenting your R Scripts/Code

Comments can be used to explain R code and to make it more readable.

Comments start with a #. When executing code, R will ignore everything from the # to the end of that line.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/CommentExample.png" alt="Figure 13. An example of commented code in R. Important Note: The red 'Untitled1' indicates this script is unsaved, so make sure to always save your scripts." width="90%" />
<p class="caption">Figure 13. An example of commented code in R. Important Note: The red 'Untitled1' indicates this script is unsaved, so make sure to always save your scripts.</p>
</div>

---

## Data Types in R: A focus on data frames (tabular data)

There are data types in R that we will never use. Moreover, this is not a comprehensive programming course in R.

The one data type that we will commonly use and manipulate throughout the semester is the __data frame__. A data frame is a data structure made up of rows and columns, similar to a nicely structured Excel spreadsheet or Google Sheets file. I may sometimes refer to this as tabular data.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/TabularExample.png" alt="Figure 14. An example of tabular data in Excel.
When loaded into R, this will be read as a data frame." width="60%" />
<p class="caption">Figure 14. An example of tabular data in Excel. When loaded into R, this will be read as a data frame.</p>
</div>

---

## Functions

__What Is a Function in R?__

Functions are among the most used objects in R. A function is a piece of executable code that performs a specific task.

`library()` is an example of a function you were briefly introduced to in earlier slides. It is R code that allows you to load a package. `library(tidyverse)` uses the library function to load the tidyverse package into R.

The file we will open and go through for today's hands-on practice will introduce you to other functions in R that work with data frames.

---

## Functions: Some examples.

`log()` is an R function that takes logarithms of numbers you feed into it. Note: by default, `log()` computes the natural logarithm, ln().

``` r
# calculates ln(10)
log(10)
```

```
## [1] 2.302585
```

<br>

`exp()` is another R function that computes the exponential value of the number you feed into it. For example, exp(2) is equivalent to calculating `\(e^{2}\)`.

``` r
# calculates exp(2)
exp(2)
```

```
## [1] 7.389056
```

---

## Functions: Base R.

__Terminology:__ Functions such as `exp()` and `log()` are functions from what is called base R. Base R refers to the built-in tools that come with the default installation of R.

The focus of this class, however, is the use of functions from packages to perform highly specialized tasks. Lectures 2-4, for example, use tidyverse functions for data visualization and manipulation.

---
class: center, middle

## Hands-on Practice 1

### Download Exercise1.R from Canvas, then follow the demo provided in class using your computer.

Today's exercise aims to give you practice running scripts and to make you more familiar with RStudio.
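---

## Functions: A warm-up sketch

Before opening Exercise1.R, it may help to see the arithmetic operators and the two base R functions from the earlier slides in one place. This is only a minimal sketch meant for the console pane. The numbers are arbitrary, and the last line, which combines `log()` and `exp()`, is an added illustration that is not taken from the exercise file.

```r
# Basic arithmetic operators
4 + 2   # addition: 6
4 - 2   # subtraction: 2
4 * 2   # multiplication: 8
4 / 2   # division: 2
4 ^ 2   # exponentiation: 16 (4 ** 2 gives the same result)

# Base R functions: no package needs to be loaded first
log(10)   # natural logarithm, ln(10): 2.302585
exp(2)    # e raised to the power 2: 7.389056

# Added illustration: log() undoes exp(), so this returns 2
log(exp(2))
```

You can type each line into the console pane and press Enter/return, or save the lines in an R Script and use the 'Run' button.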
---

## Key Takeaways from Exercise1.R

This is a pattern we will generally use throughout the semester:

- We generally start our R code by loading the package(s) we need (e.g., `library(tidyverse)`) at the very beginning.
- After loading all the package(s) needed, we load all the data frame(s) needed into R. In the next few lectures, we will be using what are called built-in data sets from R packages. I will give you the name of the built-in data set (e.g., mpg), then you have to load it into R by running: `data([name I will give you])`. In Exercise1.R, we ran `data(mpg)`.
- Inspect the structure of the data frame (e.g., `head(mpg)`, `glimpse(mpg)`).
- Perform analysis (or create data visualizations) using functions.

<br>

__Next week:__ We will be creating basic data visualizations.

---
class: center, middle

## Homework will be posted on Canvas around the evening of August 29 (Friday). It will be due on September 5 (Friday) by 11:59pm.
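---

## Key Takeaways: A sketch of the pattern

The four-step pattern from the Key Takeaways slide can be sketched roughly as follows. This is not a copy of Exercise1.R. It assumes the tidyverse is already installed, and the final `summary()` call is only a stand-in chosen here for the "perform analysis" step.

```r
# Step 1: load the package(s) needed (assumes install.packages("tidyverse") was run once)
library(tidyverse)

# Step 2: load the built-in data set named in class
data(mpg)

# Step 3: inspect the structure of the data frame
head(mpg)      # first six rows
glimpse(mpg)   # one line per column, with its type and first few values

# Step 4: perform an analysis using functions
# summary() is a base R stand-in for this step, not necessarily what the exercise uses
summary(mpg)
```

Later lectures replace Step 4 with tidyverse functions for data visualization and manipulation.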